Abstract

We consider a class of fully stochastic and fully distributed algorithms that we prove to learn equilibria in games. More precisely, we consider a family of stochastic distributed dynamics that we prove to converge weakly (in the sense of weak convergence for probabilistic processes) towards their mean-field limit, i.e., an ordinary differential equation (ODE) in the general case. We then focus on a class of stochastic dynamics where this ODE turns out to be related to multipopulation replicator dynamics. Using known facts about the convergence of this ODE, we discuss the convergence of the initial stochastic dynamics. For general games, there might be non-convergence, but when convergence of the ODE holds, the considered stochastic algorithms converge towards Nash equilibria. For games admitting Lyapunov functions, which we call Lyapunov games, the stochastic dynamics converge. We prove that any ordinal potential game, and hence any potential game, is a Lyapunov game with a multiaffine Lyapunov function. For Lyapunov games with a multiaffine Lyapunov function, we prove that this Lyapunov function is a super-martingale over the stochastic dynamics. This provides a way to bound their convergence time by martingale arguments. This applies in particular to many classes of games that have been considered in the literature, including several load balancing game scenarios and congestion games.







1 Introduction 



Consider a scenario where agents learn from their experiments by small adjustments. This might be, for example, about choosing their telephone companies, or about their portfolio investments. We are interested in understanding when the whole market can converge towards rational situations, i.e., Nash equilibria in the sense of game theory. It is natural to expect dynamics of adjustments to be stochastic and fully distributed. Agents adapt their strategies based on their local knowledge of the market. They are also often involved in games where a deterministic description of the whole market, whether global or local, is not possible.

Several such dynamics of adjustments have been considered recently in the algorithmic game theory literature. To our knowledge, this has been done mainly for deterministic dynamics or best-response based dynamics, and computing a best response requires a global description of the market. Stochastic variations, avoiding a global description, have been considered. However, the dynamics considered there are rather ad hoc, designed to obtain efficient convergence time bounds, and are still mainly best-response based. We want to consider here more general dynamics, and discuss when one may expect convergence. This could lead to considering any dynamics that is monotone with respect to the utility of players, in relation with the evolutionary game theory literature [?]. We restrict here to dynamics whose behavior is related to (possibly perturbed) replicator dynamics.

As algorithmic game theory can be seen as an algorithmic version of classical game theory, our long-term aim is to better understand algorithmic evolutionary game theory. Similarly, best-response dynamics can be seen as strategies that visit corners of the simplex of (mixed) strategies. Our long-term objective is learning methods that could be seen as interior point methods for finding equilibria.

Basic game theory framework. Let $[n] = \{1, \dots, n\}$ be the set of players. Every player $i$ has a set $S_i$ of pure strategies. Let $m_i$ be the cardinality of $S_i$. A mixed strategy $q_i = (q_{i,1}, q_{i,2}, \dots, q_{i,m_i})$ corresponds to a probability distribution over pure strategies: pure strategy $\ell$ is chosen with probability $q_{i,\ell} \in [0,1]$, with $\sum_{\ell=1}^{m_i} q_{i,\ell} = 1$. Let $K_i$ be the simplex of mixed strategies for player $i$. Any pure strategy $\ell$ can be considered as the mixed strategy $e_\ell$, where $e_\ell$ denotes the unit probability vector whose $\ell$-th component is one, hence as a corner of $K_i$.

Let $K = \prod_{i=1}^n K_i$ be the space of all mixed strategies. A strategy profile $Q = (q_1, \dots, q_n) \in K$ specifies the (mixed or pure) strategies of all players: $q_i$ corresponds to the mixed strategy played by player $i$. Following classical convention, we often write, abusively, $Q = (q_i, Q_{-i})$, where $Q_{-i}$ denotes the vector of the strategies played by all other players.

We allow games whose payoffs may be random: we only assume that whenever the strategy profile $Q \in K$ is known, each player $i$ gets a random cost of expected value $c_i(Q)$. In particular, the expected cost for player $i$ for playing pure strategy $e_\ell$ is denoted by $c_i(e_\ell, Q_{-i})$.

Some classes of games. The algorithmic game theory literature has considered several classes of games where players' costs are based on the shared usage of a common set of resources $[m] = \{1, 2, \dots, m\}$. Each resource $1 \le r \le m$ has an associated nondecreasing cost function denoted by $C_r : [n] \to \mathbb{R}$.

In load balancing games [?], resources are called machines, and players compete for elements (i.e., singleton subsets) of $[m]$. Hence, the pure strategy space $S_i$ of player $i$, having a weight $w_i$, corresponds to $[m]$ or a subset of $[m]$, and a pure strategy $q_i \in S_i$ for player $i$ is some element $r \in [m]$. The cost for player (task) $i$ under profile of pure strategies (assignment) $Q = (q_1, \dots, q_n)$ is $c_i(Q) = C_{q_i}(\lambda_{q_i}(Q))$. Here $\lambda_r(Q)$ is the load of machine $r$: $\lambda_r(Q) = \sum_{j : q_j = r} w_j$, that is to say, the sum of the weights of the tasks running on it.
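To make the load balancing cost concrete, here is a minimal Python sketch. It computes the loads $\lambda_r(Q)$ and the costs $c_i(Q) = C_{q_i}(\lambda_{q_i}(Q))$ for a pure assignment. The instance (weights, linear cost functions) and the function names `machine_loads` and `task_costs` are hypothetical illustrations, not taken from the cited works.

```python
from collections import defaultdict


def machine_loads(assignment, weights):
    """lambda_r(Q): sum of the weights of the tasks assigned to each machine r."""
    loads = defaultdict(float)
    for task, machine in enumerate(assignment):
        loads[machine] += weights[task]
    return loads


def task_costs(assignment, weights, cost_fns):
    """c_i(Q) = C_{q_i}(lambda_{q_i}(Q)): each task pays the cost of its machine's load."""
    loads = machine_loads(assignment, weights)
    return [cost_fns[machine](loads[machine]) for machine in assignment]


# Hypothetical instance: 3 tasks of weights 1, 2, 3 and 2 machines with linear costs.
weights = [1.0, 2.0, 3.0]
cost_fns = {0: lambda x: x, 1: lambda x: 2.0 * x}
assignment = [0, 0, 1]  # tasks 0 and 1 on machine 0, task 2 on machine 1
print(task_costs(assignment, weights, cost_fns))  # [3.0, 3.0, 6.0]
```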

In congestion games [?], resources are called edges, and players compete for subsets of $[m]$. Hence, the pure strategy space $S_i$ of player $i$ is a subset of $2^{[m]}$, and a pure strategy $q_i \in S_i$ for player $i$ is a subset of $[m]$. The cost of player $i$ under profile of pure strategies $Q$ is $c_i(Q) = \sum_{r \in q_i} C_r(\lambda_r(Q))$, where $\lambda_r(Q)$ is the number of $q_j$ with $r \in q_j$. In weighted congestion games, weights $(w_i)_i$ are associated to players, and one takes instead $\lambda_r(Q) = \sum_{j : r \in q_j} w_j$.

In task allocation games [?], as in load balancing games, resources are called machines, and players compete for elements (i.e., singleton subsets) of $[m]$. Each resource (machine) $r$ is assumed to have a function $C_r$. It takes as input a set of tasks $A \subseteq [n]$ assigned to it, and outputs a cost $C_{r,j}(A)$ for each participating player $j \in A$. The cost of player $i$ under profile of pure strategies $Q$ is then given by $c_i(Q) = C_{q_i,i}(\{j \mid q_j = q_i\})$. Functions $C_r$ can be considered as speed and scheduling policies, and the associated costs as the corresponding completion times for player (task) $i$. For example, SPT and LPT are policies that schedule the jobs without preemption in order of increasing or decreasing weights (processing times), respectively [?].

Clearly, load balancing games are particular task allocation games, and load balancing games are particular weighted congestion games. A load balancing game whose weights are all equal to one is a particular congestion game.

Ordinal and potential games. All these classes of games can be related to ordinal and potential games, introduced by [?]. A game is an ordinal potential game if there exists some function $\phi$ from pure strategies to $\mathbb{R}$ with the following property: for all pure strategies $Q_{-i}$, $q_i$, and $q'_i$, one has $c_i(q_i, Q_{-i}) - c_i(q'_i, Q_{-i}) > 0$ iff $\phi(q_i, Q_{-i}) - \phi(q'_i, Q_{-i}) > 0$. It is an (exact) potential game if for all pure strategies $Q_{-i}$, $q_i$, and $q'_i$, one has $c_i(q_i, Q_{-i}) - c_i(q'_i, Q_{-i}) = \phi(q_i, Q_{-i}) - \phi(q'_i, Q_{-i})$.
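As an illustration of the definition of exact potential games, the following sketch uses an unweighted congestion game. It checks, by brute force over all unilateral deviations, that the game admits Rosenthal's potential $\phi(Q) = \sum_r \sum_{k=1}^{\lambda_r(Q)} C_r(k)$ as an exact potential. The instance is hypothetical and integer-valued so that exact equality can be tested. The helper names are ours.

```python
from itertools import product


def loads(profile):
    """lambda_r(Q): number of players whose chosen subset contains resource r."""
    count = {}
    for subset in profile:
        for r in subset:
            count[r] = count.get(r, 0) + 1
    return count


def player_cost(profile, i, cost_fns):
    """c_i(Q) = sum over r in q_i of C_r(lambda_r(Q))."""
    lam = loads(profile)
    return sum(cost_fns[r](lam[r]) for r in profile[i])


def rosenthal_potential(profile, cost_fns):
    """phi(Q) = sum_r sum_{k=1}^{lambda_r(Q)} C_r(k)."""
    lam = loads(profile)
    return sum(sum(cost_fns[r](k) for k in range(1, lam[r] + 1)) for r in lam)


def is_exact_potential(strategy_sets, cost_fns):
    """Check c_i(q_i, Q_-i) - c_i(q_i', Q_-i) == phi(q_i, Q_-i) - phi(q_i', Q_-i) for every deviation."""
    for profile in product(*strategy_sets):
        for i, alternatives in enumerate(strategy_sets):
            for alt in alternatives:
                deviated = list(profile)
                deviated[i] = alt
                dc = player_cost(profile, i, cost_fns) - player_cost(deviated, i, cost_fns)
                dphi = rosenthal_potential(profile, cost_fns) - rosenthal_potential(deviated, cost_fns)
                if dc != dphi:
                    return False
    return True


# Hypothetical instance: 3 resources with integer-valued nondecreasing costs, 2 players.
cost_fns = {0: lambda k: k, 1: lambda k: 2 * k, 2: lambda k: k * k}
strategy_sets = [
    [frozenset({0}), frozenset({1, 2})],
    [frozenset({0, 2}), frozenset({1})],
]
print(is_exact_potential(strategy_sets, cost_fns))  # True
```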

2 Stochastic Learning Algorithms 

Generic Stochastic Learning Algorithm. We want to consider learning algorithms of the following form, over games as general as possible. Here $b$ is a parameter, intended to be positive but close to 0.

• Initially, $q_i(0) \in K_i$ can be any probability vector, for all $i$.

• At each round $t$,

  • Every player $i$ selects a strategy $s_i(t) \in S_i$ according to distribution $q_i(t)$: player $i$ selects strategy $\ell \in S_i$ with probability $q_{i,\ell}(t)$. This leads to a (random) cost $r_i(t)$ for player $i$.

  • Some player $i(t)$ is selected at random: player $i$ is selected with probability $p_i$, with $\sum_{i=1}^n p_i = 1$. This player $i = i(t)$ updates $q_i(t)$ as follows: $q_i(t+1) = q_i(t) + b F_i^b(r_i(t), s_i(t), q_i(t))$.

  • Every other player $j \neq i(t)$ keeps $q_j(t)$ unchanged: $q_j(t+1) = q_j(t)$.

In a first step, we consider functions $F_i^b(r_i(t), s_i(t), q_i(t))$ as generic as possible, provided that the $q_i(t)$ always remain valid probability vectors. That is to say, $q_{i,\ell}(t) \in [0,1]$ and $\sum_\ell q_{i,\ell}(t) = 1$ are preserved. Functions $F_i^b(r_i(t), s_i(t), q_i(t))$ can be random (formally, random variables). We only assume that the expectation $E[F_i^b(r_i(t), s_i(t), q_i(t)) \mid Q(t)]$ is always defined.
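A minimal sketch of one round of the generic algorithm follows. It assumes the update $F_i^b$ and a cost oracle are passed in as plain Python functions. The names `learning_round`, `sample_cost`, `example_F` and `example_cost`, and the toy 2-player game, are hypothetical. The only point is the structure:

- every player samples $s_i(t)$;
- a single player $i(t)$, drawn with probabilities $p_i$, updates $q_i(t+1) = q_i(t) + b F_i^b(\cdot)$;
- the others keep $q_j$ unchanged.

The updating player only uses her own cost and mixed strategy.

```python
import random


def learning_round(Q, p, b, sample_cost, F, rng):
    """One round of the generic stochastic learning algorithm.

    Q: list of mixed strategies q_i (lists of probabilities); p: selection probabilities p_i;
    sample_cost(i, s): hypothetical oracle returning the random cost r_i(t) under pure profile s;
    F(r, s_i, q_i, b): hypothetical update direction F_i^b(r_i(t), s_i(t), q_i(t)).
    """
    # Every player samples a pure strategy s_i(t) according to q_i(t).
    s = [rng.choices(range(len(q)), weights=q)[0] for q in Q]
    # A single player i(t), drawn with probability p_i, updates; the others keep q_j(t).
    i = rng.choices(range(len(Q)), weights=p)[0]
    r = sample_cost(i, s)
    step = F(r, s[i], Q[i], b)
    new_Q = [list(q) for q in Q]
    new_Q[i] = [x + b * d for x, d in zip(Q[i], step)]
    return new_Q


def example_F(r, s_i, q_i, b):
    # Hypothetical update: move q_i towards the played pure strategy, more strongly when r is low.
    weight = 1.0 / (1.0 + r)
    return [weight * ((1.0 if l == s_i else 0.0) - x) for l, x in enumerate(q_i)]


def example_cost(i, s):
    # Hypothetical 2-player anti-coordination game: cost 1 if both pick the same strategy, else 0.
    return 1.0 if s[0] == s[1] else 0.0


rng = random.Random(0)
Q = [[0.5, 0.5], [0.5, 0.5]]
for _ in range(1000):
    Q = learning_round(Q, [0.5, 0.5], 0.01, example_cost, example_F, rng)
print(Q)
```

With `example_F`, each update is a convex combination of $q_i$ and a unit vector, so the $q_i(t)$ stay valid probability vectors, as required above.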

This indeed corresponds to fully distributed algorithms¹. Decisions made by players are completely decentralized. At each time step, player $i$ only needs $r_i$ and $q_i$, that is to say, respectively, her cost and her current mixed strategy, to update her own strategy $q_i$.

Let $Q(t) = (q_1(t), \dots, q_n(t)) \in K$ denote the state of all players at instant $t$. Our interest is in the asymptotic behavior of $Q(t)$, and its possible convergence to Nash equilibria. Assume that $G_i(Q) = \lim_{b \to 0} E[F_i^b(r_i(t), s_i(t), q_i(t)) \mid Q]$ exists and is some continuous function $G_i$ of $Q$.

Results. In the general case (Theorem 1), any stochastic algorithm in the considered class converges weakly (in the sense of weak convergence for probabilistic processes). The limit is the solution of the initial value problem (ordinary differential equation (ODE)) $\frac{dq_i}{dt} = p_i G_i(Q)$, given $Q(0)$, i.e., the mean-field limit approximation of the algorithm.

This can be seen informally as follows. From the description of the algorithm, $E[\Delta q_i(t) \mid Q(t)] = b\, p_i F_i^b(Q(t))$, where $F_i^b(Q(t)) = E[F_i^b(r_i(t), s_i(t), q_i(t)) \mid Q(t)]$. Assume we replace $E[\Delta q_i(t) \mid Q(t)]$ by $\Delta q_i(t)$ in this equation.

¹ For some games (like congestion games), the size of the involved probability vectors might of course be non-polynomial. However, this can be solved by restricting to functions $F_i^b(r_i(t), s_i(t), q_i(t))$, or close dynamics, that guarantee a support of polynomial size for $q_i(t)$. For example, restrict to functions that are equal to $-b q_{i,\ell}$ for components $\ell$ outside a polynomial (or fixed) size support. If this is too problematic for the reader, consider that we restrict to games where the $m_i$ stay polynomial, as for load balancing games and task allocation games.
Through the change of variable $t \leftarrow tb$, this would become $q_i(t+b) - q_i(t) = b\, p_i F_i^b(Q)$. Approximate $q_i(t+b) - q_i(t)$ by $b \frac{dq_i}{dt}(t)$ for small $b$. We may then expect the system to behave like the ordinary differential equation (ODE)

$$\frac{dq_i}{dt} = p_i G_i(Q), \qquad (1)$$

when $b$ is close to 0.

A replicator-like dynamic $F_i^b$ is a dynamic where

$$F_i^b(r_i(t), s_i(t), q_i(t)) = \gamma(r_i(t))\,(e_{s_i(t)} - q_i(t)) + O(b),$$

or where this holds for its expectation. Here $\gamma : \mathbb{R} \to [0,1]$ is some decreasing function².
Recall that $e_{s_i(t)}$ is the unit vector of dimension $m_i$ whose component number $s_i(t)$ is one.

Notice that we allow perturbed dynamics: $O(b)$ denotes some perturbation that stays of the order of parameter $b$.

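We can also allow randomly perturbed dynamics: a perturbed replicator-like dynamic is of the form

$$F_i^b(r_i(t), s_i(t), q_i(t)) = \begin{cases} \gamma(r_i(t))\,(e_{s_i(t)} - q_i(t)) & \text{with probability } \alpha, \\ b\,(e_\ell - q_i(t)) & \text{with probability } 1 - \alpha, \text{ where } \ell \text{ is chosen uniformly}, \end{cases}$$

where $0 < \alpha < 1$ is some constant.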
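The replicator-like update and its randomly perturbed variant can be sketched as below. The choice $\gamma(x) = 1/(1+x)$ (decreasing, with values in $(0,1]$ for non-negative costs) is an assumption, since the text leaves $\gamma$ generic. The $O(b)$ term is taken to be zero in the unperturbed branch. Because $q_i + b\gamma(e_s - q_i) = (1 - b\gamma) q_i + b\gamma e_s$ is a convex combination when $b\gamma \le 1$, the update keeps $q_i$ a probability vector.

```python
import random


def gamma(x):
    # Assumed decreasing map from non-negative costs into (0, 1]; the text leaves gamma generic.
    return 1.0 / (1.0 + x)


def unit(m, l):
    """Unit vector e_l of dimension m."""
    return [1.0 if k == l else 0.0 for k in range(m)]


def replicator_like_F(r, s_i, q_i, b):
    """F_i^b = gamma(r_i)(e_{s_i} - q_i), with the O(b) perturbation taken to be zero."""
    g = gamma(r)
    return [g * (e - x) for e, x in zip(unit(len(q_i), s_i), q_i)]


def perturbed_F(r, s_i, q_i, b, alpha, rng):
    """With probability alpha the replicator-like direction, otherwise b(e_l - q_i) for a uniform l."""
    if rng.random() < alpha:
        return replicator_like_F(r, s_i, q_i, b)
    l = rng.randrange(len(q_i))
    return [b * (e - x) for e, x in zip(unit(len(q_i), l), q_i)]


def apply_update(q_i, direction, b):
    """q_i(t+1) = q_i(t) + b F: a convex combination of q_i and a unit vector when b * gamma <= 1."""
    return [x + b * d for x, d in zip(q_i, direction)]


q = [0.2, 0.3, 0.5]
print(apply_update(q, replicator_like_F(1.0, 2, q, 0.1), 0.1))  # approximately [0.19, 0.285, 0.525]
print(apply_update(q, perturbed_F(1.0, 2, q, 0.1, 0.5, random.Random(3)), 0.1))
```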

We claim that such dynamics have a mean-field approximation which is isomorphic to a multipopulation 
replicator dynamics. 

We claim (Theorem 3) that for general games, if the mean-field approximation converges, then its stable limit points correspond to Nash equilibria of the game. Notice that there is no reason for the mean-field approximation to converge for generic games. However, if it does, then its stable limit points are Nash equilibria.

We claim (Theorem 4) that ordinal potential games (and hence (exact) potential games) are Lyapunov games: their mean-field limit approximation admits some Lyapunov function. Furthermore, this Lyapunov function can be taken as the expectation of the potential, and is of a special type that we call multiaffine.

We show that for Lyapunov games with a multiaffine Lyapunov function, the Lyapunov function is a super-martingale over the stochastic dynamics. This includes ordinal and (exact) potential games such as load balancing, task allocation and congestion games.

We deduce results on the convergence of stochastic algorithms for this class. We claim (Theorem 5) that for generic Lyapunov games with a multiaffine Lyapunov function, convergence towards Nash equilibria happens in an expected time of order [...], taking $b$ of order $\varepsilon$.

Related work. It is clear that an (exact) potential game is an ordinal potential game. Congestion games, and hence load balancing games, are known to be particular (exact) potential games [?]. Actually, it is known that a game is an (exact) potential game iff it is isomorphic to a congestion game [?]. It has been proved in [?] that task allocation games are ordinal potential games for SPT and LPT policies. The proof builds some function $\phi$, taking values of the form $(l_1, \dots, l_n)$, that is lexicographically decreasing iff a player is doing a best-response move. As the $l_i$ (which correspond to loads) are bounded by some constant $K$, the function $\phi = \sum_i l_i K^{n-i}$ is decreasing iff a player is doing a best-response move.
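The encoding of lexicographic vectors into a single number can be checked with a short sketch. It assumes (our assumption) that the $l_i$ are integers with $0 \le l_i < K$, so that $(l_1, \dots, l_n)$ are digits in base $K$. If the $l_i$ can reach $K$, base $K+1$ should be used instead: the last line shows two distinct vectors that collide otherwise. The function names are hypothetical.

```python
from itertools import product


def encode(levels, K):
    """phi = sum_{i=1}^{n} l_i * K**(n - i): read (l_1, ..., l_n) as base-K digits, l_1 most significant."""
    n = len(levels)
    return sum(l * K ** (n - i) for i, l in enumerate(levels, start=1))


def order_preserved(n, K):
    """Lexicographic order on integer vectors with entries in {0, ..., K-1} matches the order of phi."""
    vectors = list(product(range(K), repeat=n))
    return all((a < b) == (encode(a, K) < encode(b, K)) for a in vectors for b in vectors)


print(encode((2, 0, 1), 3))                      # 2*9 + 0*3 + 1 = 19
print(order_preserved(3, 4))                     # True
print(encode((0, 3), 3) == encode((1, 0), 3))    # True: a digit equal to K breaks the encoding
```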

In other words, task allocation games under SPT and LPT policies are indeed ordinal potential games, 
under the terminology of [?] . 

An ordinal potential game always has a pure Nash equilibrium [?]. The ordinal potential function can take only a finite number of values, and strictly decreases along any sequence of strict best-response moves in pure strategies. Hence such a sequence must be finite and must lead to a Nash equilibrium. This proof of existence of pure Nash equilibria can be turned into a dynamic: players play in turn, and move to resources with a lower cost.

² If we assume all costs to be positive, then by linearity of expectation all costs must be bounded by some constant $M$, and we can take, for example, $\gamma(x) = $ [...].
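The best-response dynamic mentioned above can be sketched on the simplest load balancing instance. We assume $m$ identical machines with $C_r(x) = x$ and integer task weights, chosen for exact arithmetic. A task moving from machine $r$ to $r'$ strictly improves iff $\lambda_{r'} + w < \lambda_r$. In that case $\sum_r \lambda_r^2$ decreases by $2w(\lambda_r - \lambda_{r'} - w) > 0$. This quantity therefore acts as an ordinal potential, and the loop stops at a pure Nash equilibrium. Function names and the instance are hypothetical.

```python
def best_response_dynamics(weights, m, assignment):
    """Elementary stepwise best-response dynamics on m >= 2 identical machines with cost C_r(x) = x.

    Tasks move one at a time, each to the least loaded other machine if this strictly lowers its cost.
    sum_r lambda_r**2 strictly decreases at every move, so the loop terminates."""
    assignment = list(assignment)
    loads = [0] * m
    for task, r in enumerate(assignment):
        loads[r] += weights[task]
    moves = 0
    improved = True
    while improved:
        improved = False
        for task in range(len(assignment)):
            r, w = assignment[task], weights[task]
            # Cheapest machine for this task among the other ones, given everybody else's choice.
            target = min((r2 for r2 in range(m) if r2 != r), key=lambda r2: loads[r2])
            if loads[target] + w < loads[r]:  # strict improvement for the moving task
                loads[r] -= w
                loads[target] += w
                assignment[task] = target
                moves += 1
                improved = True
    return assignment, loads, moves


def is_pure_nash(weights, m, assignment, loads):
    """No task can strictly lower its cost by moving alone to another machine."""
    return all(loads[r2] + weights[t] >= loads[assignment[t]]
               for t in range(len(assignment)) for r2 in range(m) if r2 != assignment[t])


weights = [3, 3, 2, 2, 2]  # hypothetical integer weights, all tasks initially on machine 0
final, loads, moves = best_response_dynamics(weights, 2, [0] * len(weights))
print(final, loads, moves, is_pure_nash(weights, 2, final, loads))
```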

For load balancing games, following this idea, bounds on the convergence time of best-response dynamics have been investigated in [?]. Since players play in turn, this is often called the Elementary Stepwise System. Other convergence results in this model have been investigated in [?, ?, ?], but all require some global knowledge of the system in order to determine the next move.

A stochastic version of best-response dynamics has been investigated in [?, ?]. It is proved to terminate in expected $O(\log\log n + m^4)$ rounds for uniform tasks and uniform machines. This has been extended to weighted tasks and uniform machines in [?]. The expected time of convergence to an $\varepsilon$-Nash equilibrium is in $O(nmW^3\varepsilon^{-2})$, where $W$ denotes the maximum weight of any task.

For congestion games, the problem of finding pure Nash equilibria is PLS-complete [?]. Efficient convergence of particular best-response dynamics to approximate Nash equilibria in symmetric congestion games has been investigated in [?]. That work assumes that each resource cost function satisfies a bounded jump assumption. In this context, convergence to $\varepsilon$-Nash equilibria occurs within a number of steps that is polynomial in the number of players. This has been extended to different classes of asymmetric congestion games in [?].

All previous discussions are about best-response dynamics. A stochastic dynamic close to those considered in this paper, but not elementary stepwise like ours, has been partially investigated in [?] for general games and for potential games. It is proved to converge weakly to solutions of a multipopulation replicator equation. Some of our arguments follow theirs, but notice that their convergence result (Theorem 3.1) is incorrect: convergence may happen towards non-Nash (unstable) stationary points. Furthermore, it is not clear that any super-martingale argument holds for such dynamics, as our proof relies on the fact that the dynamics is elementary stepwise.

Replicator equations have been deeply studied in evolutionary game theory [?, ?]. Evolutionary game theory has been applied to routing problems in the Wardrop traffic model in [?, ?]. Potential games have been generalized to continuous player sets in [?], where they are shown to lead to multipopulation replicator equations. Our dynamics are not about continuous player sets, but they lead to similar dynamics, so we borrow several constructions from [?]. No discussion of convergence time is done in [?].

A replicator equation for routing games has been considered in [?], where a Lyapunov function is established. The dynamics considered in [?] involve marginal costs. In [?, ?], the replicator dynamics for particular allocation games are proved to converge to a pure Nash equilibrium, by modifying game costs in order to obtain Lyapunov functions.



3 Mean-Field Approximation For Generic Stochastic Algorithms 

Recall that we are interested in discussing the evolution of $Q(t)$, where $Q(t) = (q_1(t), \dots, q_n(t)) \in K$ denotes the state of the player team at instant $t$ in the stochastic algorithm.

Clearly, $Q(t)$ is a homogeneous Markov chain. Define $\Delta Q(t) = Q(t+1) - Q(t)$ and $\Delta q_i(t) = q_i(t+1) - q_i(t)$. We can write

$$E[\Delta q_i(t) \mid Q(t)] = b\, p_i\, E[F_i^b(r_i(t), s_i(t), q_i(t)) \mid Q(t)], \qquad (2)$$

with $G_i(Q) = \lim_{b \to 0} E[F_i^b(r_i(t), s_i(t), q_i(t)) \mid Q(t) = Q]$ assumed to be continuous under our hypotheses.

Convergence of the stochastic algorithms towards the ordinary differential equations defining their mean-field limit approximation can be formalized as follows. Consider the piecewise-linear interpolation $Q^b(\cdot)$ of $Q(t)$ defined by $Q^b(t) = Q(\lfloor t/b \rfloor) + (t/b - \lfloor t/b \rfloor)\,(Q(\lfloor t/b \rfloor + 1) - Q(\lfloor t/b \rfloor))$. Function $Q^b(\cdot)$ belongs to the space of all functions from $\mathbb{R}$ into $K$ that are right continuous and have left-hand limits (càdlàg functions). Now consider the sequence $\{Q^b(\cdot) : b > 0\}$. We are interested in the limit $Q(\cdot)$ of this sequence when $b \to 0$. Recall that a family of random variables $(Y_t)_{t \in \mathbb{R}}$ converges weakly to a random variable $Y$ if $E[h(Y_t)]$ converges to $E[h(Y)]$ for each bounded and continuous function $h$.
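The interpolation $Q^b(\cdot)$ is straightforward to compute from a simulated sequence $Q(0), Q(1), \dots$. The sketch below flattens each state into a list of floats. The function name `interpolate` and the choice to hold the last state beyond the simulated horizon are our assumptions.

```python
import math


def interpolate(Q_seq, b, t):
    """Q^b(t) = Q(k) + (t/b - k)(Q(k+1) - Q(k)) with k = floor(t/b); Q_seq[k] is the flattened state Q(k)."""
    k = math.floor(t / b)
    if k >= len(Q_seq) - 1:  # beyond the simulated horizon: hold the last state (our convention)
        return list(Q_seq[-1])
    frac = t / b - k
    return [x + frac * (y - x) for x, y in zip(Q_seq[k], Q_seq[k + 1])]


Q_seq = [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]  # hypothetical states Q(0), Q(1), Q(2)
print(interpolate(Q_seq, 0.1, 0.05))  # halfway between Q(0) and Q(1): [0.25, 0.75]
```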
Theorem 1 The sequence of interpolated processes $\{Q^b(\cdot)\}$ converges weakly, when $b \to 0$, to $Q(\cdot)$, which is the (unique deterministic) solution of the initial value problem

$$\frac{dq_i}{dt} = p_i G_i(Q), \qquad i = 1, \dots, n, \qquad (3)$$

with $Q(0) = Q^b(0)$.
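Theorem 1 can be illustrated numerically in the simplest possible case, using a hypothetical instance:

- one player ($n = 1$, $p_1 = 1$);
- two pure strategies with deterministic costs $0$ and $1$;
- the replicator-like update with the assumed $\gamma(x) = 1/(1+x)$.

Then $\gamma_0 = 1$, $\gamma_1 = 1/2$, and the mean-field equation reduces to the logistic equation $\frac{dq_0}{dt} = \frac{1}{2} q_0 (1 - q_0)$. The sketch compares the average of a few stochastic runs with small $b$, taken at time $T$ (that is, after $T/b$ rounds), with the closed-form solution.

```python
import math
import random

COSTS = [0.0, 1.0]  # hypothetical deterministic costs of the two pure strategies


def gamma(x):
    return 1.0 / (1.0 + x)  # assumed decreasing map into (0, 1]


def simulate(q0, b, T, rng):
    """Single player (p_1 = 1), replicator-like update; returns q_0 after T/b rounds."""
    q = [q0, 1.0 - q0]
    for _ in range(int(round(T / b))):
        s = 0 if rng.random() < q[0] else 1
        g = gamma(COSTS[s])
        q = [x + b * g * ((1.0 if l == s else 0.0) - x) for l, x in enumerate(q)]
    return q[0]


def mean_field(q0, T):
    """Closed-form solution at time T of dq_0/dt = q_0 (1 - q_0) / 2 (gamma_0 = 1, gamma_1 = 1/2)."""
    return 1.0 / (1.0 + (1.0 - q0) / q0 * math.exp(-0.5 * T))


rng = random.Random(42)
runs = [simulate(0.2, 0.0005, 4.0, rng) for _ in range(10)]
average = sum(runs) / len(runs)
print(average, mean_field(0.2, 4.0))  # the two values should be close
```

Averaging over runs only reduces sampling noise; as $b$ decreases, individual runs themselves concentrate around the ODE solution.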



To prove the theorem, we will use the following theorem from [?, Theorem 11.2.3]. The following presentation is inspired by its presentation in [?, Theorem 5.8, page 96].

Suppose that for all $b > 0$ we have a homogeneous Markov chain $(Y_k^{(b)})_k$ in $\mathbb{R}^d$ with transition kernel $\pi^{(b)}(x, dy)$. This means that the law of $Y_{k+1}^{(b)}$, conditioned on $Y_0^{(b)}, \dots, Y_k^{(b)}$, depends only on $Y_k^{(b)}$. It is given, for all Borel sets $B$, by $P(Y_{k+1}^{(b)} \in B \mid Y_k^{(b)}) = \pi^{(b)}(Y_k^{(b)}, B)$, almost surely.

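Define, for $x \in \mathbb{R}^d$,

$$d^{(b)}(x) = \frac{1}{b} \int (y - x)\, \pi^{(b)}(x, dy),$$

$$a^{(b)}(x) = \frac{1}{b} \int (y - x)(y - x)^T\, \pi^{(b)}(x, dy),$$

$$K^{(b)}(x) = \frac{1}{b} \int |y - x|^3\, \pi^{(b)}(x, dy),$$

$$\Delta_\varepsilon^{(b)}(x) = \frac{1}{b}\, \pi^{(b)}(x, B(x, \varepsilon)^c),$$

where $B(x, \varepsilon)^c$ denotes the complement of the ball of radius $\varepsilon$ centered at $x$.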

The coefficients $d^{(b)}$ and $a^{(b)}$ can be interpreted as the instantaneous drift and the variance (or covariance matrix) of $Y^{(b)}$.

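Define

$$X^{(b)}(t) = Y^{(b)}_{\lfloor t/b \rfloor} + (t/b - \lfloor t/b \rfloor)\,\big(Y^{(b)}_{\lfloor t/b \rfloor + 1} - Y^{(b)}_{\lfloor t/b \rfloor}\big).$$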

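Theorem 2 ([?, Theorem 11.2.3], [?, Theorem 5.8, page 96]) Suppose that there exist some continuous functions $d$, $a$ such that for all $R < +\infty$,

$$\lim_{b \to 0} \sup_{|x| \le R} |a^{(b)}(x) - a(x)| = 0,$$

$$\lim_{b \to 0} \sup_{|x| \le R} |d^{(b)}(x) - d(x)| = 0,$$

$$\lim_{b \to 0} \sup_{|x| \le R} \Delta_\varepsilon^{(b)}(x) = 0, \quad \forall \varepsilon > 0,$$

$$\sup_{|x| \le R} K^{(b)}(x) < \infty.$$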

With $\sigma$ a matrix such that $\sigma(x)\sigma^T(x) = a(x)$, $x \in \mathbb{R}^d$, we suppose that the stochastic differential equation

$$dX(t) = d(X(t))\,dt + \sigma(X(t))\,dB(t), \qquad X(0) = x, \qquad (4)$$

has a unique weak solution for all $x$. This is in particular the case if it admits a unique strong solution.

Then for all sequences of initial conditions $Y_0^{(b)} \to x$, the sequence of random processes $X^{(b)}$ converges weakly to the diffusion given by Equation (4). In other words, for all bounded and continuous functions $F : C(\mathbb{R}^+, \mathbb{R}^d) \to \mathbb{R}$, one has

$$\lim_{b \to 0} E[F(X^{(b)})] = E[F(X)].$$
Theorem 1 follows from the previous theorem. Consider $(Y_k^{(b)})$ to be

$$Y_k^{(b)} = Q(k)$$

with the corresponding $b$, which is indeed a homogeneous Markov chain. Let $\pi^{(b)}(Q, dy)$ be its transition kernel.



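We have

$$d_i^{(b)}(Q) = \frac{1}{b} \int (y_i - q_i)\, \pi^{(b)}(Q, dy) = \frac{1}{b} E[\Delta q_i \mid Q] \longrightarrow p_i G_i(Q) \quad \text{when } b \to 0,$$

and

$$a_{i,j}^{(b)}(Q) = \frac{1}{b} E[\Delta q_i\, \Delta q_j^T \mid Q] = O(b) \longrightarrow 0 \quad \text{when } b \to 0.$$

In the same vein, $K^{(b)}(Q)$ clearly stays bounded, being in $O(b^2)$.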

Now, since the compact set $K$ is kept invariant by the dynamics, $\pi^{(b)}(Q, \cdot)$ must have compact support. This means that $\pi^{(b)}(Q, B(Q, \varepsilon)^c)$ is $0$ for $b$ sufficiently small. Hence $\lim_{b \to 0} \sup_{|x| \le R} \Delta_\varepsilon^{(b)}(x) = 0$ for all $\varepsilon > 0$.

Hence, we have all the hypotheses of the previous theorem with $a(Q) = 0$ and

$$d(Q) = (p_1 G_1(Q), \dots, p_n G_n(Q)).$$

The corresponding stochastic differential equation $dQ(t) = d(Q(t))\,dt + \sigma(Q(t))\,dB(t)$ then turns out to be an ordinary differential equation. Its solution is unique by the (classical) Cauchy-Lipschitz theorem.
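The scaling used in the proof can be checked exactly on a one-player example, without sampling, by enumerating the strategy $s$ drawn with probability $q_s$. The sketch assumes $\gamma(x) = 1/(1+x)$ and deterministic costs; the function name is hypothetical. It computes $d^{(b)}(q) = \frac{1}{b}E[\Delta q \mid q]$ and $a^{(b)}(q) = \frac{1}{b}E[\Delta q\,\Delta q^T \mid q]$. The drift does not depend on $b$, while the second moment is proportional to $b$, hence vanishes as $b \to 0$.

```python
def one_step_moments(q, costs, b, gamma=lambda x: 1.0 / (1.0 + x)):
    """Exact drift d^(b)(q) = E[dq]/b and second moment a^(b)(q) = E[dq dq^T]/b for a single
    player (p_1 = 1) using Delta q = b * gamma(r)(e_s - q), with deterministic cost costs[s]."""
    m = len(q)
    drift = [0.0] * m
    second = [[0.0] * m for _ in range(m)]
    for s, prob in enumerate(q):  # the pure strategy s is drawn with probability q_s
        g = gamma(costs[s])
        delta = [b * g * ((1.0 if l == s else 0.0) - x) for l, x in enumerate(q)]
        for l in range(m):
            drift[l] += prob * delta[l] / b
            for k in range(m):
                second[l][k] += prob * delta[l] * delta[k] / b
    return drift, second


for b in (0.1, 0.01, 0.001):
    drift, second = one_step_moments([0.2, 0.8], [0.0, 1.0], b)
    print(b, drift[0], second[0][0])  # drift[0] is about 0.08 for every b; second[0][0] is about 0.136 * b
```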



4 General Games and Replicator-Like Dynamics 

From now on, we restrict to (possibly perturbed) replicator-like dynamics, as defined in Section 2. For replicator-like dynamics, set $\alpha = 1$ in what follows.

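For replicator-like dynamics and perturbed replicator-like dynamics, the one-step dynamics of the stochastic algorithm can be rewritten componentwise. For the perturbed dynamics, the right-hand side is understood in expectation over the random choice between its two branches:

$$\Delta q_{i,\ell}(t) = q_{i,\ell}(t+1) - q_{i,\ell}(t) = \begin{cases} 0 & \text{if } i \neq i(t), \\ \alpha\,\big(-b\,\gamma(r_i(t))\,q_{i,\ell}(t)\big) + O(b^2) & \text{if } i = i(t) \text{ and } s_i(t) \neq \ell, \\ \alpha\,\big(-b\,\gamma(r_i(t))\,q_{i,\ell}(t) + b\,\gamma(r_i(t))\big) + O(b^2) & \text{if } i = i(t) \text{ and } s_i(t) = \ell, \end{cases}$$

and we have

$$\begin{aligned}
G_{i,\ell}(Q) &= \lim_{b \to 0} \frac{1}{b\,p_i} E[\Delta q_{i,\ell}(t) \mid Q(t)] \\
&= \lim_{b \to 0} \frac{1}{b} \sum_j q_{i,j}(t)\, E[\Delta q_{i,\ell}(t) \mid Q(t), s_i(t) = j, i(t) = i] \\
&= \alpha\, q_{i,\ell}(t)\, E[\gamma(r_i(t)) \mid Q(t), s_i(t) = \ell, i(t) = i] \\
&\quad - \alpha \sum_j q_{i,j}(t)\, q_{i,\ell}(t)\, E[\gamma(r_i(t)) \mid Q(t), s_i(t) = j, i(t) = i] \\
&= \alpha\, q_{i,\ell}(t)\, \big(E[\gamma(r_i(t)) \mid Q(t), s_i(t) = \ell, i(t) = i] - E[\gamma(r_i(t)) \mid Q(t), i(t) = i]\big).
\end{aligned}$$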

Introduce $u_i(Q) = E[-\gamma(r_i(Q)) \mid Q]$ for all $Q$. By Theorem 1, Equation (3) then leads to an ordinary differential equation. It turns out to be (a rescaling of) the classical (multipopulation) replicator dynamics

$$\frac{dq_{i,\ell}}{dt} = -p_i\, q_{i,\ell}\, \big(u_i(e_\ell, Q_{-i}) - u_i(q_i, Q_{-i})\big), \qquad (5)$$

whose limit points are related to Nash equilibria (through the so-called folk theorems of evolutionary game theory [?]).
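A direct transcription of the right-hand side of (5) for a 2-player game with finite strategy sets is sketched below, together with an explicit Euler integration. Costs are given by matrices $A$ (player 1) and $B$ (player 2), so that $u_1(e_\ell, q_2) = \sum_k A_{\ell k} q_{2,k}$ and $u_2(e_\ell, q_1) = \sum_k B_{k\ell} q_{1,k}$. This matrix encoding, the example game and the step size are assumptions for illustration. Each player's right-hand side sums to zero, so Euler steps preserve $\sum_\ell q_{i,\ell} = 1$.

```python
def expected_costs(Q, A, B):
    """u_1(e_l, q_2) = sum_k A[l][k] q_2[k] and u_2(e_l, q_1) = sum_k B[k][l] q_1[k]."""
    q1, q2 = Q
    u1 = [sum(A[l][k] * q2[k] for k in range(len(q2))) for l in range(len(q1))]
    u2 = [sum(B[k][l] * q1[k] for k in range(len(q1))) for l in range(len(q2))]
    return [u1, u2]


def replicator_rhs(Q, p, A, B):
    """Right-hand side of (5): dq_{i,l}/dt = -p_i q_{i,l} (u_i(e_l, Q_-i) - u_i(q_i, Q_-i))."""
    rhs = []
    for q_i, u_i, p_i in zip(Q, expected_costs(Q, A, B), p):
        avg = sum(x * u for x, u in zip(q_i, u_i))  # u_i(q_i, Q_-i), by linearity in q_i
        rhs.append([-p_i * x * (u - avg) for x, u in zip(q_i, u_i)])
    return rhs


def euler(Q, p, A, B, dt, steps):
    """Explicit Euler integration of (5); sums are preserved and a small dt keeps entries non-negative."""
    for _ in range(steps):
        rhs = replicator_rhs(Q, p, A, B)
        Q = [[x + dt * d for x, d in zip(q_i, d_i)] for q_i, d_i in zip(Q, rhs)]
    return Q


# Hypothetical coordination game (costs): both players prefer to pick the same strategy, ideally 0.
A = B = [[0.0, 2.0], [2.0, 1.0]]
Q_end = euler([[0.6, 0.4], [0.6, 0.4]], [0.5, 0.5], A, B, dt=0.1, steps=500)
print(Q_end)  # close to [[1, 0], [1, 0]]
```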

Here, $u_i(Q)$ is taken as $u_i(Q) = E[-\gamma(r_i(Q)) \mid Q]$ for replicator-like dynamics, and as $u_i(Q) = E[-\alpha\gamma(r_i(Q)) \mid Q]$ for perturbed replicator-like dynamics. The game whose costs are defined by the $u_i$ is clearly isomorphic to the original game. Notice that when $\gamma$ is affine, this just introduces another rescaling in (5).

Using properties of dynamics (5), we get:

Theorem 3 For general games, for any replicator-like or perturbed replicator-like dynamic, the sequence of interpolated processes $\{Q^b(\cdot)\}$ converges weakly, as $b \to 0$, to the unique deterministic solution of (5) with $Q(0) = Q^b(0)$. If the mean-field approximation dynamic (5) converges, its stable limit points correspond to Nash equilibria of the game.

More precisely, we have: 

Proposition 1 The following are true for the solutions of Equation (5): (i) all Nash equilibria are stationary points; (ii) all stable stationary points are Nash equilibria; (iii) however, (unstable) stationary points can include some non-Nash equilibria.

The following are well-known (and obtained by just playing with definitions). 

Lemma 1 A strategy profile $Q$ is a Nash equilibrium iff $u_i(q_i, Q_{-i}) \le u_i(e_\ell, Q_{-i})$ for all $1 \le i \le n$, $1 \le \ell \le m_i$.

Corollary 1 In a Nash equilibrium, we have $u_i(q_i, Q_{-i}) = u_i(e_\ell, Q_{-i})$ for all $1 \le i \le n$, $1 \le \ell \le m_i$ with $q_{i,\ell} > 0$.
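Lemma 1 gives a simple numerical test, sketched below for 2-player games: compare the expected cost of the mixed strategy with the cost of every pure strategy. Here $A[l][k]$ is the cost of player 1 when she plays $l$ and player 2 plays $k$. $B[k][l]$ is the cost of player 2 when player 1 plays $k$ and she plays $l$. This encoding and the matching-pennies instance, where the uniform profile is an equilibrium, are hypothetical. The tolerance `tol` absorbs floating-point rounding.

```python
def is_nash(Q, A, B, tol=1e-9):
    """Lemma 1 for a 2-player game: Q is a Nash equilibrium iff, for each player i,
    u_i(q_i, Q_-i) <= u_i(e_l, Q_-i) for every pure strategy l (costs: lower is better)."""
    q1, q2 = Q
    u1 = [sum(A[l][k] * q2[k] for k in range(len(q2))) for l in range(len(q1))]
    u2 = [sum(B[k][l] * q1[k] for k in range(len(q1))) for l in range(len(q2))]
    for q_i, u_i in ((q1, u1), (q2, u2)):
        mixed = sum(x * v for x, v in zip(q_i, u_i))  # u_i(q_i, Q_-i)
        if any(mixed > v + tol for v in u_i):
            return False
    return True


# Hypothetical matching pennies in cost form: player 1 pays 1 on a mismatch, player 2 pays 1 on a match.
A_mp = [[0.0, 1.0], [1.0, 0.0]]
B_mp = [[1.0, 0.0], [0.0, 1.0]]
print(is_nash([[0.5, 0.5], [0.5, 0.5]], A_mp, B_mp))  # True
print(is_nash([[1.0, 0.0], [1.0, 0.0]], A_mp, B_mp))  # False: player 2 would rather switch
```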

Proposition 1 is then an instance of the so-called folk theorems of evolutionary game theory [?]. For completeness, the proof goes as follows. From Corollary 1, any Nash equilibrium clearly makes the right-hand side of Equation (5) vanish.

A non-Nash equilibrium $Q$ is not stable. Indeed, if $Q$ is not a Nash equilibrium, then for some $i$ and some $\ell$ we have $u_i(q_i, Q_{-i}) > u_i(e_\ell, Q_{-i})$. By linearity and continuity of $u_i$, the function $u_i(q_i - e_\ell, Q_{-i}) = u_i(q_i, Q_{-i}) - u_i(e_\ell, Q_{-i})$ must be strictly positive (say, greater than $\varepsilon$) on some neighborhood of $Q$. On this neighborhood, $\frac{dq_{i,\ell}}{dt}$ is greater than $p_i q_{i,\ell} \varepsilon$. Hence the point is left exponentially fast (at least as fast as $q_{i,\ell}(0) \exp(p_i \varepsilon t)$).

In a corner of $K$, we have, for all $i$, $q_i = e_\ell$ for some $\ell$. Then clearly $q_{i,\ell'} = 0$ for every index $\ell' \neq \ell$, and $u_i(e_{\ell'}, Q_{-i}) - u_i(q_i, Q_{-i}) = 0$ for index $\ell' = \ell$. Hence, the right-hand side of Equation (5) is always null, and any corner is a stationary point.

More generally, from the form of (5), any state $Q$ in which all strategies in its support perform equally well is a stationary point. Such a state $Q$ is not a Nash equilibrium as soon as there is an unused strategy (i.e., outside of the support) that performs better.
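The situation just described can be reproduced on a hypothetical symmetric 2x2 cost game where strategy 0 is always cheaper. At the corner where both players use strategy 1, every term of (5) vanishes, yet this corner is not a Nash equilibrium. Starting instead with a tiny mass on the unused strategy, an Euler integration of (5) leaves the corner and reaches the pure equilibrium. The step size $0.1$ and the selection probabilities $p_i = 1/2$ are arbitrary choices.

```python
A_K = B_K = [[0.0, 2.0], [2.0, 3.0]]  # hypothetical symmetric costs (lower is better)


def costs(Q):
    """u_i(e_l, Q_-i) for both players of the 2x2 game."""
    q1, q2 = Q
    u1 = [A_K[l][0] * q2[0] + A_K[l][1] * q2[1] for l in range(2)]
    u2 = [B_K[0][l] * q1[0] + B_K[1][l] * q1[1] for l in range(2)]
    return [u1, u2]


def rhs(Q, p=(0.5, 0.5)):
    """Right-hand side of the replicator dynamics (5)."""
    out = []
    for q_i, u_i, p_i in zip(Q, costs(Q), p):
        avg = q_i[0] * u_i[0] + q_i[1] * u_i[1]
        out.append([-p_i * x * (u - avg) for x, u in zip(q_i, u_i)])
    return out


corner = [[0.0, 1.0], [0.0, 1.0]]  # both players use strategy 1 and pay 3
print(rhs(corner))                 # all components are zero: the corner is stationary
print(costs(corner)[0])            # [2.0, 3.0]: switching to strategy 0 would cost 2 < 3, so not Nash

# A tiny amount of mass on the unused strategy is enough to leave the corner.
Q = [[0.01, 0.99], [0.01, 0.99]]
for _ in range(1000):
    Q = [[x + 0.1 * d for x, d in zip(q_i, d_i)] for q_i, d_i in zip(Q, rhs(Q))]
print(Q)                           # close to [[1, 0], [1, 0]], the pure Nash equilibrium
```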

Unstable stationary limit points may exist for the mean-field approximation (5). Consider, for example, a dynamics that stays on some face of $K$ where some well-performing strategy is never used. One way to avoid "bad" (non-Nash equilibrium, hence unstable) stationary points follows the idea of penalty functions for interior point methods. As in Appendix A.3 of [?], one can apply some patches to the dynamics that guarantee non-complacency. Non-complacency (NC) is the following property: $G(Q) = 0$ implies that $Q$ is a Nash equilibrium (i.e., for (5), stationarity implies Nash).

This can be thought of as the price to pay for purely deterministic models³. Actually, when dealing with stochastic dynamics, all this can be avoided by taking advantage of the instability of non-Nash stationary points. This is the idea behind the perturbed replicator-like dynamics already defined. It guarantees that unstable points are left almost surely by the associated stochastic algorithm; technically, it ensures ergodicity of the underlying Markov chain. Notice that a replicator-like dynamic where $O(b) = 0$ is not ergodic: an unstable stationary point, like a corner of $K$, is invariant forever, and the underlying Markov chain is hence not irreducible.

³ And perhaps, to some extent, artifacts of modeling.

For general games, the limit for $b \to 0$ is some ordinary differential equation whose stable limit points, when $t \to \infty$, if they exist, can only be Nash equilibria. Hence, if the ordinary differential equation converges, one expects the previous stochastic algorithms to learn equilibria.

Observe that, roughly speaking, for non-degenerate games, learning interior (hence mixed) Nash equilibria by such methods is often problematic. Hence, in practice, only pure Nash equilibria may be learned, since the following is known:

Proposition 2 ([?, ?], [?, page 218]) If a closed set $X \subseteq K$ belongs to the relative interior of some face of $K$, then $X$ is not asymptotically stable under dynamics (5).



5 Lyapunov Games, Ordinal and Potential Games 

Since general games have no reason to converge, we now restrict to games for which the replicator equation, or more generally the dynamics (3), is provably convergent. As this often relies in practice on some Lyapunov function argument, we propose the following terminology.

Definition 1 (Lyapunov Game) We say that a game has a Lyapunov function (with respect to a particular dynamic (3) over $K$), or that the game is Lyapunov, if there exists some non-negative $C^1$ function $F : K \to \mathbb{R}$ such that for all $Q$, whenever $G(Q) \neq 0$,

$$\sum_{i,\ell} p_i \frac{\partial F}{\partial q_{i,\ell}}(Q)\, G_{i,\ell}(Q) < 0. \qquad (6)$$

Lyapunov games include ordinal potential (and hence (exact) potential) games. We say that a Lyapunov function $F : K \to \mathbb{R}$ is multiaffine if it satisfies three conditions. It is a polynomial in all its variables, it is of degree 1 in each variable, and none of its monomials contains a product of the form $q_{i,\ell}\, q_{i,\ell'}$.

Theorem 4 An ordinal potential game is a Lyapunov game with respect to dynamics (5). Furthermore, it has some multiaffine Lyapunov function.

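Proof: Consider $F(Q) = E[\phi(Q) \mid \text{players play pure strategies according to probability distribution } Q]$, where $\phi$ is the potential of the ordinal potential game. By linearity of expectation, $F(Q)$ is clearly multiaffine.

Now, by linearity of expectation, we have $F(q_i, Q_{-i}) = \sum_\ell q_{i,\ell} F(e_\ell, Q_{-i})$, and hence $\frac{\partial F}{\partial q_{i,\ell}}(Q) = F(e_\ell, Q_{-i})$. For dynamics (5), the left-hand side of (6) rewrites as

$$\begin{aligned}
\sum_{i,\ell} p_i \frac{\partial F}{\partial q_{i,\ell}}(Q)\, G_{i,\ell}(Q) &= -\sum_i p_i \sum_\ell F(e_\ell, Q_{-i})\, q_{i,\ell}\, \big(u_i(e_\ell, Q_{-i}) - u_i(q_i, Q_{-i})\big) \\
&= -\sum_i p_i \sum_\ell \sum_{\ell'} q_{i,\ell}\, q_{i,\ell'}\, F(e_\ell, Q_{-i})\, \big(u_i(e_\ell, Q_{-i}) - u_i(e_{\ell'}, Q_{-i})\big) \\
&= -\sum_i p_i \sum_{\ell < \ell'} q_{i,\ell}\, q_{i,\ell'}\, \big(F(e_\ell, Q_{-i}) - F(e_{\ell'}, Q_{-i})\big)\big(u_i(e_\ell, Q_{-i}) - u_i(e_{\ell'}, Q_{-i})\big).
\end{aligned}$$

Since the game is ordinal, $(F(e_\ell, Q_{-i}) - F(e_{\ell'}, Q_{-i}))(u_i(e_\ell, Q_{-i}) - u_i(e_{\ell'}, Q_{-i}))$ is always non-negative, by definition, and hence $F$ is a Lyapunov function. □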
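The identity at the heart of the proof can be checked numerically on an exact potential game, which is a particular ordinal potential game. The sketch uses a hypothetical 2-player game with costs $c_1(a,b) = \phi(a,b) + h_1(b)$ and $c_2(a,b) = \phi(a,b) + h_2(a)$. It takes $u_i$ to be the expected cost (i.e., $\gamma$ assumed affine, which only rescales (5)). It then compares the left-hand side of (6) with the last line of the derivation at random points of $K$. For this game, $F(e_\ell, Q_{-i}) - F(e_{\ell'}, Q_{-i}) = u_i(e_\ell, Q_{-i}) - u_i(e_{\ell'}, Q_{-i})$, so the result is a negated sum of squares.

```python
import random

# Hypothetical exact potential game: player 1 has 2 strategies, player 2 has 3.
PHI = [[1.0, 4.0, 2.0], [3.0, 0.0, 5.0]]  # potential phi(a, b)
H1 = [0.5, 2.0, 1.0]                      # part of c_1 depending only on player 2's strategy b
H2 = [3.0, 1.0]                           # part of c_2 depending only on player 1's strategy a


def u(Q):
    """u_i(e_l, Q_-i), taken here as expected costs (gamma assumed affine, i.e. a rescaling)."""
    q1, q2 = Q
    u1 = [sum(q2[b] * (PHI[a][b] + H1[b]) for b in range(3)) for a in range(2)]
    u2 = [sum(q1[a] * (PHI[a][b] + H2[a]) for a in range(2)) for b in range(3)]
    return [u1, u2]


def grad_F(Q):
    """Partial derivatives of F(Q) = E[phi(Q)]: dF/dq_{i,l} = F(e_l, Q_-i) by multiaffinity."""
    q1, q2 = Q
    return [[sum(q2[b] * PHI[a][b] for b in range(3)) for a in range(2)],
            [sum(q1[a] * PHI[a][b] for a in range(2)) for b in range(3)]]


def G(Q):
    """G_{i,l}(Q) = -q_{i,l} (u_i(e_l, Q_-i) - u_i(q_i, Q_-i)), so that (5) reads dq_i/dt = p_i G_i(Q)."""
    out = []
    for q_i, u_i in zip(Q, u(Q)):
        avg = sum(x * v for x, v in zip(q_i, u_i))
        out.append([-x * (v - avg) for x, v in zip(q_i, u_i)])
    return out


def lyapunov_lhs(Q, p):
    """Left-hand side of (6): sum over i, l of p_i dF/dq_{i,l}(Q) G_{i,l}(Q)."""
    return sum(p_i * g * d for p_i, g_i, G_i in zip(p, grad_F(Q), G(Q)) for g, d in zip(g_i, G_i))


def closed_form(Q, p):
    """Last line of the proof: -sum_i p_i sum_{l < l'} q_{i,l} q_{i,l'} (F_l - F_l')(u_l - u_l')."""
    total = 0.0
    for p_i, q_i, g_i, u_i in zip(p, Q, grad_F(Q), u(Q)):
        for l in range(len(q_i)):
            for k in range(l + 1, len(q_i)):
                total -= p_i * q_i[l] * q_i[k] * (g_i[l] - g_i[k]) * (u_i[l] - u_i[k])
    return total


def random_profile(rng):
    """A random point of K = K_1 x K_2."""
    profile = []
    for m in (2, 3):
        w = [rng.random() + 1e-9 for _ in range(m)]
        profile.append([x / sum(w) for x in w])
    return profile


rng = random.Random(0)
p = [0.3, 0.7]
samples = [random_profile(rng) for _ in range(1000)]
print(max(lyapunov_lhs(Q, p) for Q in samples))  # never positive, up to rounding
```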

More precisely, if $\phi$ is the potential of the ordinal potential game, then one can take its expectation $F(Q) = E[\phi(Q)] = E[\phi(Q) \mid \text{players play pure strategies according to } Q]$ as a Lyapunov function with respect to dynamics (5).

The following class of games has also been introduced in the literature.

Definition 2 (Potential Game [?]) A game is called a continuous potential game if there exists a $C^1$ function $F : K \to \mathbb{R}$ such that for all $i$, $\ell$ and $Q$,

$$\frac{\partial F}{\partial q_{i,\ell}}(Q) = u_i(e_\ell, Q_{-i}). \qquad (7)$$
Proposition 3 A continuous potential game is a Lyapunov game with respect to dynamics (5). Furthermore, it has some multiaffine Lyapunov function.



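Proof:

By definition, $F$ is multiaffine. This is clear, as all its partial derivatives are known: they are given by $\frac{\partial F}{\partial q_{i,\ell}}(Q) = u_i(e_\ell, Q_{-i})$.

Now, in this case, for dynamics (5), the left-hand side of (6) rewrites as

$$\begin{aligned}
\sum_{i,\ell} p_i \frac{\partial F}{\partial q_{i,\ell}}(Q)\, G_{i,\ell}(Q) &= -\sum_i p_i \sum_\ell u_i(e_\ell, Q_{-i})\, q_{i,\ell}\, \big(u_i(e_\ell, Q_{-i}) - u_i(q_i, Q_{-i})\big) \\
&= -\sum_i p_i \sum_\ell \sum_{\ell'} q_{i,\ell}\, q_{i,\ell'}\, u_i(e_\ell, Q_{-i})\, \big(u_i(e_\ell, Q_{-i}) - u_i(e_{\ell'}, Q_{-i})\big) \\
&= -\sum_i p_i \sum_{\ell < \ell'} q_{i,\ell}\, q_{i,\ell'}\, \big(u_i(e_\ell, Q_{-i}) - u_i(e_{\ell'}, Q_{-i})\big)^2,
\end{aligned}$$

which is hence negative on non-stationary points. □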
Recall that exact potential games have been defined in Section 1, following [?], in terms of pure strategies. The notions turn out to be equivalent when $F$ is assumed to be at least $C^2$.

Proposition 4 An (exact) potential game with potential $\phi$ leads to a continuous potential game with $F(Q) = E[\phi(Q)]$. Conversely, take a potential $F$ of class $C^2$ in the sense of the above definition: its restriction to pure strategies leads to an (exact) potential.

Proof: 

In other words, a game is a continuous potential game if there exists some C 1 function whose gradient 
V/ equals the cost vector H — («j(e;, Q))i,i- Function F, which is unique up to an additive constant, is 
called the potential function of the game. 

When F is C 2 , condition is equivalent to externality symmetry [?, ?]: 

dui(et,Q) duj(e' e ,Q) 

— 5 _ a ' <-°J 

oqj,e< oq i: i 

for all i,j,£,£'. In that case, by a well-known result (characterization of exact forms), if we fix any z 6 K, 
F is given by 

n mi />1 

f(Q)-EE/ Ui{e t ,x(t)>i(t)dt, (9) 

where x : [0, 1] — ► K is any piecewise continuous differentiable path in K that connects z to Q (i.e. x(0) = z, 
x(l) =Q). 

In particular, the integral in (9) must be independent of the chosen path. Considering paths from pure strategies to pure strategies, the second part of the proposition follows from the characterization of (exact) potential games in [?]. The first part of the proposition is easy to establish, in the same vein as we established $\frac{\partial F}{\partial q_{i,\ell}}(Q) = F(e_\ell, Q_{-i})$ in the proof of Theorem 4 above. □
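Path independence of the integral in (9) can be illustrated numerically. The following minimal sketch assumes a hypothetical two-player game with multiaffine potential $F(Q) = q_1^\top \Phi q_2$, whose costs satisfy (7); the matrix `phi`, the points `z`, `w`, `Q` and all function names are illustrative. It integrates along the direct segment from `z` to `Q` and along a detour through `w`, and compares both values with $F(Q) - F(z)$.

```python
def costs(phi, q1, q2):
    """u_i(e_l, Q_{-i}) for the game whose potential is F(Q) = q1^T phi q2."""
    u1 = [sum(phi[a][b] * q2[b] for b in range(len(q2))) for a in range(len(q1))]
    u2 = [sum(phi[a][b] * q1[a] for a in range(len(q1))) for b in range(len(q2))]
    return u1, u2


def potential(phi, q1, q2):
    """F(Q) = q1^T phi q2, a multiaffine function."""
    return sum(q1[a] * phi[a][b] * q2[b] for a in range(len(q1)) for b in range(len(q2)))


def segment_integral(phi, start, end, steps=200):
    """Midpoint rule for sum_{i,l} int_0^1 u_i(e_l, x(t)) x'_{i,l}(t) dt
    along the straight segment x(t) = start + t (end - start)."""
    (s1, s2), (e1, e2) = start, end
    d1 = [b - a for a, b in zip(s1, e1)]  # x'_1(t), constant on the segment
    d2 = [b - a for a, b in zip(s2, e2)]  # x'_2(t)
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) / steps
        x1 = [a + t * d for a, d in zip(s1, d1)]
        x2 = [a + t * d for a, d in zip(s2, d2)]
        u1, u2 = costs(phi, x1, x2)
        total += (sum(u * d for u, d in zip(u1, d1)) + sum(u * d for u, d in zip(u2, d2))) / steps
    return total


def path_integral(phi, points):
    """Integral in (9) along the piecewise-linear path through the given points of K."""
    return sum(segment_integral(phi, a, b) for a, b in zip(points, points[1:]))


phi = [[1.0, 3.0], [2.0, 0.5]]
z = ([1.0, 0.0], [1.0, 0.0])    # starting point: a pure profile
w = ([0.0, 1.0], [0.5, 0.5])    # hypothetical intermediate point of K
Q = ([0.25, 0.75], [0.4, 0.6])
direct = path_integral(phi, [z, Q])
detour = path_integral(phi, [z, w, Q])
```

Because the costs are affine in each $q_j$, the integrand is affine in t along each segment, so the midpoint rule is exact here up to rounding; segments between points of K stay in K by convexity.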

A Lyapunov game can have a non-multiaffine potential function; hence not all Lyapunov games with respect to dynamics (5) are ordinal games. We believe that Lyapunov games with respect to dynamics (5) with a multiaffine potential function also form a class different from ordinal games.

The interest of Lyapunov functions is that they provide convergence. Recall that the limit set $\omega(Q_0)$ of a point $Q_0$ is the set of accumulation points of the trajectory that starts from $Q_0$: it is the set of $Q^*$ with $Q^* = \lim_{n\to\infty} Q(t_n)$ for some increasing sequence $(t_n)_{n\ge0}$ of reals with $t_n \to \infty$.

Proposition 5 In any Lyapunov game with respect to any dynamic (3) over K, the solutions of the mean-field approximation (3) have a limit set $\omega(Q)$ that is non-empty, compact, connected, and consists entirely of stationary points of the dynamic. On these limit sets, F is constant.

Proof: This is a well-known fact; it appears for example as Lemma A.1 of [?].

For self-containedness, here is a slight adaptation of the proof of the Lyapunov stability theorem [?, page 194].
$F(Q(t))$ must be monotone along trajectories, since Equation (6) guarantees $\frac{d}{dt}F(Q(t)) \le 0$. Let $Q(t)$ be some solution of the ordinary differential equation (3) with $Q(0) = x$. Let $Q_0 \in \omega(x)$, that is to say $Q(t_n) \to Q_0$ for some sequence $t_n \to \infty$. We claim that $Q_0$ must be a stationary point of the dynamics, that is to say, $G(Q_0) = 0$. To see this, observe that $F(Q(t)) \ge F(Q_0)$ for all t, since $F(Q(t))$ decreases and $F(Q(t_n))$ converges to $F(Q_0)$ by continuity of F.

Suppose that $G(Q_0) \neq 0$. Let $Z(t)$ be the solution of the ordinary differential equation starting from $Q_0$. For any $s > 0$, we have $F(Z(s)) < F(Q_0)$; fix such an s. By continuous dependence of solutions on initial conditions, for any solution $Y(\cdot)$ starting sufficiently near $Q_0$ we also have $F(Y(s)) < F(Q_0)$. Setting $Y(0) = Q(t_n)$ for sufficiently large n yields the contradiction $F(Q(t_n + s)) < F(Q_0)$. Therefore, $G(Q_0) = 0$.

This proves that any limit set consists entirely of stationary points of the dynamics. It is non-empty because the trajectory remains in the compact set K.

By continuity of F, $F(Q_0) = \lim_{n\to\infty} F(Q(t_n))$ for any limit point $Q_0$. Since $F(Q(t))$ is non-increasing, this limit equals $\inf_t F(Q(t))$, and hence is independent of $Q_0$.

The set $\omega(x)$ of limit points equals $\bigcap_{t\ge0}\overline{\{Q(s) : s \ge t\}}$, a decreasing intersection of non-empty compact connected sets; hence it is compact and connected.

□ 
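The monotonicity of $F(Q(t))$ used in this proof can be observed on a discretized trajectory. The sketch below is only an illustration under stated assumptions: it uses a hypothetical two-player game with potential $F(Q) = q_1^\top \Phi q_2$ and costs given by (7), the replicator field $G_{i,\ell}(Q) = -p_i q_{i,\ell}(u_i(e_\ell, Q_{-i}) - u_i(q_i, Q_{-i}))$, and an explicit Euler scheme with a small step `h` in place of the exact solution of the ODE. It is not the stochastic algorithm studied in the paper, and all names and values are illustrative.

```python
def costs(phi, q1, q2):
    """u_i(e_l, Q_{-i}) for the game whose potential is F(Q) = q1^T phi q2."""
    u1 = [sum(phi[a][b] * q2[b] for b in range(len(q2))) for a in range(len(q1))]
    u2 = [sum(phi[a][b] * q1[a] for a in range(len(q1))) for b in range(len(q2))]
    return u1, u2


def potential(phi, q1, q2):
    """F(Q) = q1^T phi q2."""
    return sum(q1[a] * phi[a][b] * q2[b] for a in range(len(q1)) for b in range(len(q2)))


def replicator(q, u, p):
    """G_{i,l}(Q) = -p_i q_{i,l} (u_i(e_l, Q_{-i}) - u_i(q_i, Q_{-i}))."""
    mean = sum(ql * ul for ql, ul in zip(q, u))
    return [-p * ql * (ul - mean) for ql, ul in zip(q, u)]


def euler_trajectory(phi, q1, q2, p1=1.0, p2=1.0, h=0.01, steps=2000):
    """Explicit Euler scheme for dQ/dt = G(Q); returns the values F(Q(t_k)) and the final state."""
    values = [potential(phi, q1, q2)]
    for _ in range(steps):
        u1, u2 = costs(phi, q1, q2)
        g1, g2 = replicator(q1, u1, p1), replicator(q2, u2, p2)
        # both players are updated from the same state Q(t_k)
        q1 = [q + h * g for q, g in zip(q1, g1)]
        q2 = [q + h * g for q, g in zip(q2, g2)]
        values.append(potential(phi, q1, q2))
    return values, q1, q2


phi = [[1.0, 3.0], [2.0, 0.5]]
values, q1_end, q2_end = euler_trajectory(phi, [0.3, 0.7], [0.6, 0.4])
```

For a sufficiently small step, the Euler update keeps each $q_i$ in the simplex and the recorded values of F are non-increasing, mirroring the argument above; a larger `h` can break both properties.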

Since all the previous classes are Lyapunov games with respect to dynamics (5), the following corollary applies to each of them.

Corollary 2 In a Lyapunov game with respect to general dynamics, whatever the initial condition, the solutions of the mean-field approximation (3) will converge. The stable limit points are Nash equilibria.

If the mean-field approximation has the (NC) property, then all limit points are Nash equilibria. Otherwise, the mean-field approximation may have unstable stationary limit points.



6 Replicator-Like Dynamics for Multiaffine Lyapunov Games

Fortunately, it is possible to go further: by the previous discussion, many of the previous classes (ordinal, (exact) potential, continuous potential, load balancing, congestion, and task allocation games) have a multiaffine Lyapunov function.

When this holds, it is possible to reason directly about the stochastic algorithms, avoiding the passage through the mean-field ordinary differential equation and the double limit $b \to 0$, $t \to \infty$. The key observation is the following (its proof mainly relies on the fact that second-order terms vanish for multiaffine functions).

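Lemma 2 When F is a multiaffine Lyapunov function,

$$E[\Delta F(t) \mid Q(t)] = \sum_{i=1}^{n}\sum_{\ell=1}^{m_i}\frac{\partial F}{\partial q_{i,\ell}}(Q(t))\,E[\Delta q_{i,\ell}(t) \mid Q(t)], \qquad (10)$$

where $\Delta F(t) = F(Q(t+1)) - F(Q(t))$ and $\Delta q_{i,\ell}(t) = q_{i,\ell}(t+1) - q_{i,\ell}(t)$.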
Proof: 

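Let us denote, for a vector $\Delta$,

$$R(Q, \Delta) = F(Q + \Delta) - F(Q) - \sum_{i=1}^{n}\sum_{\ell=1}^{m_i}\frac{\partial F}{\partial q_{i,\ell}}(Q)\,\Delta_{i,\ell},$$

so that, taking $\Delta = \Delta Q(t)$, we have

$$\Delta F(t) = F(Q(t+1)) - F(Q(t)) = \sum_{i=1}^{n}\sum_{\ell=1}^{m_i}\frac{\partial F}{\partial q_{i,\ell}}(Q)\,\Delta q_{i,\ell}(t) + R(Q, \Delta Q(t)).$$

We then have

$$E[\Delta F(t) \mid Q(t)] = \sum_{i=1}^{n}\sum_{\ell=1}^{m_i}\frac{\partial F}{\partial q_{i,\ell}}(Q)\,E[\Delta q_{i,\ell}(t) \mid Q(t)] + E[R(Q, \Delta Q(t)) \mid Q(t)].$$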

It only remains to prove that $E[R(Q, \Delta Q(t)) \mid Q(t)] = 0$ when F is multiaffine.
A multiaffine function F is a particular polynomial function, of degree 1 in each variable. By definition, $R(Q, \Delta Q)$ is hence also a polynomial function, of degree 1 in each variable $\Delta q_{i,\ell}$. By construction, it has no constant term and no monomial of the form $\beta_{i,\ell}\Delta q_{i,\ell}$. Hence, each of its monomials is a multiple of some product $\Delta q_{i,\ell}(t)\,\Delta q_{j,\ell'}(t)$ with $(i,\ell) \neq (j,\ell')$.

By the definition of multiaffine function used in this paper, there cannot be such products $\Delta q_{i,\ell}(t)\,\Delta q_{j,\ell'}(t)$ with $i = j$ among these monomials.

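Observe that $\Delta q_{i,\ell}(t)\,\Delta q_{j,\ell'}(t) = 0$ for $i \neq j$: indeed, at any time t, at most one player moves in the considered class of algorithms; in other words, we use the fact that the considered algorithms are elementary stepwise. Hence $R(Q, \Delta Q(t)) = 0$, and in particular its conditional expectation vanishes. □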

When considering a Lyapunov game with respect to replicator-like dynamics, using the equation defining the dynamics and the fact that $G_i(Q) = \lim_{b\to0} F^b_i(Q)$, the right-hand side of Equation (10) is

$$b\sum_{i=1}^{n}\sum_{\ell=1}^{m_i}\frac{\partial F}{\partial q_{i,\ell}}(Q)\,G_{i,\ell}(Q) + O(b^2), \qquad (11)$$

and hence is expected to be negative by Equation (6) when $G(Q) \neq 0$ and b is sufficiently small.

In other words, when b is small, $(F(Q(t)))_t$ will be a super-martingale until reaching a point where (11) is close to 0.

More precisely, for a replicator-like dynamics, Equation (11) rewrites to

$$-\frac{b}{2}\sum_i p_i\sum_{\ell,\ell'} q_{i,\ell}q_{i,\ell'}\bigl(u_i(e_\ell, Q_{-i}) - u_i(e_{\ell'}, Q_{-i})\bigr)^2 + O(b^2).$$

As expected, on corners of K the first term vanishes, so this quantity is close to 0, and $(F(Q(t)))_t$ is hence not necessarily a super-martingale there.

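For the perturbed replicator-like dynamics, taking the perturbation O(b) on page 3 to be 0, Equation (11) rewrites to

$$-\frac{b\alpha}{2}\sum_i p_i\sum_{\ell,\ell'} q_{i,\ell}q_{i,\ell'}\bigl(u_i(e_\ell, Q_{-i}) - u_i(e_{\ell'}, Q_{-i})\bigr)^2 + b^2(1-\alpha)\sum_i\sum_\ell\frac{\partial F}{\partial q_{i,\ell}}(Q)\Bigl(\frac{1}{m_i} - q_{i,\ell}\Bigr),$$

which can be written

$$-\frac{b\alpha}{2}\sum_i p_i\sum_{\ell,\ell'} q_{i,\ell}q_{i,\ell'}\bigl(u_i(e_\ell, Q_{-i}) - u_i(e_{\ell'}, Q_{-i})\bigr)^2 + O(b^2).$$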
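The rewriting of (11) for the perturbed dynamics can be checked by computing $\sum_{i,\ell}\frac{\partial F}{\partial q_{i,\ell}}(Q)\,E[\Delta q_{i,\ell} \mid Q]$ directly and comparing it with the closed form. This is a minimal sketch, assuming a hypothetical two-player game with potential $F(Q) = q_1^\top \Phi q_2$ (so that (7) holds) and the expected step $E[\Delta q_{i,\ell} \mid Q] = -\alpha b p_i q_{i,\ell}(u_i(e_\ell, Q_{-i}) - u_i(q_i, Q_{-i})) + b^2(1-\alpha)(\frac{1}{m_i} - q_{i,\ell})$ of the perturbed replicator-like dynamics given later in this section; all names and numerical values are illustrative.

```python
def costs(phi, q1, q2):
    """u_i(e_l, Q_{-i}) for the game whose potential is F(Q) = q1^T phi q2."""
    u1 = [sum(phi[a][b] * q2[b] for b in range(len(q2))) for a in range(len(q1))]
    u2 = [sum(phi[a][b] * q1[a] for a in range(len(q1))) for b in range(len(q2))]
    return u1, u2


def expected_step(q, u, p, b, alpha):
    """E[Delta q_{i,l} | Q] for the perturbed replicator-like dynamics, perturbation O(b) set to 0."""
    m = len(q)
    mean = sum(ql * ul for ql, ul in zip(q, u))  # u_i(q_i, Q_{-i})
    return [-alpha * b * p * ql * (ul - mean) + b * b * (1 - alpha) * (1.0 / m - ql)
            for ql, ul in zip(q, u)]


def drift_via_gradient(phi, q1, q2, p, b, alpha):
    """Right-hand side of (10): sum_{i,l} dF/dq_{i,l}(Q) E[Delta q_{i,l} | Q]."""
    u1, u2 = costs(phi, q1, q2)
    return sum(sum(x * y for x, y in zip(u, expected_step(q, u, pi, b, alpha)))
               for q, u, pi in ((q1, u1, p[0]), (q2, u2, p[1])))


def drift_closed_form(phi, q1, q2, p, b, alpha):
    """-(b alpha/2) sum_i p_i sum_{l,l'} q q' (u_l - u_l')^2
    + b^2 (1 - alpha) sum_{i,l} u_i(e_l, Q_{-i}) (1/m_i - q_{i,l})."""
    u1, u2 = costs(phi, q1, q2)
    first = second = 0.0
    for q, u, pi in ((q1, u1, p[0]), (q2, u2, p[1])):
        first += pi * sum(q[a] * q[c] * (u[a] - u[c]) ** 2
                          for a in range(len(q)) for c in range(len(q)))
        second += sum(ul * (1.0 / len(q) - ql) for ql, ul in zip(q, u))
    return -0.5 * b * alpha * first + b * b * (1 - alpha) * second


phi = [[1.0, 3.0], [2.0, 0.5]]
interior = drift_closed_form(phi, [0.3, 0.7], [0.6, 0.4], (1.0, 1.0), 0.01, 0.9)
corner = drift_closed_form(phi, [1.0, 0.0], [1.0, 0.0], (1.0, 1.0), 0.01, 0.9)
```

At the pure profile $(e_1, e_1)$ of this toy game, which is a pure Nash equilibrium, the $b\alpha$ term vanishes and the $b^2(1-\alpha)$ term is positive, so the expected variation of F is positive there: this is the corner effect noted above.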

When considering stochastic perturbed dynamics, this super-martingale argument yields the following stability result, where we write $L(\mu)$ for the subset of states Q on which $F(Q) \le \mu$.

Proposition 6 Let $\lambda > 1$. Let $Q(t_0)$ be some state. Consider b small enough so that (11) is non-positive outside of $L(F(Q(t_0)))$. Then, with probability at least $1 - \frac{1}{\lambda}$, $Q(t) \in L(\lambda F(Q(t_0)))$ forever after time $t_0$, i.e. for all $t \ge t_0$.

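Proof: Take $t_0 = 0$ without loss of generality. Consider the sequence $Z_n = \max_{t\le n} F(Q(t))$, let $\mathcal{T}_n$ be the sigma-algebra generated by $(Q(j))_{j\le n}$, and apply Proposition 8 with $\lambda' = \lambda E[Z_0]$:

$$P\bigl[\exists n,\ F(Q(n)) \ge \lambda F(Q(0))\bigr] = P\bigl[\sup_n Z_n \ge \lambda'\bigr] \le \frac{E[Z_0]}{\lambda'} = \frac{1}{\lambda}.$$

□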
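Proposition 8, used in this proof, is a maximal inequality for non-negative super-martingales. The following Monte Carlo sketch illustrates it on a hypothetical multiplicative martingale $Z_{n+1} = Z_n U_{n+1}$ with $U_{n+1} \in \{0.5, 1.5\}$ equiprobable, an example chosen only for illustration and unrelated to the game dynamics. The estimated probability of ever reaching $\lambda Z_0$ within the simulated horizon should stay below $1/\lambda$, up to sampling noise.

```python
import random


def crossing_frequency(lam, n_steps=200, n_paths=5000, seed=1):
    """Fraction of simulated paths of Z_{n+1} = Z_n * U, Z_0 = 1, that reach lam * Z_0
    within n_steps steps; U is 0.5 or 1.5 with probability 1/2, so E[U] = 1 and (Z_n)
    is a non-negative martingale, in particular a non-negative super-martingale."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_paths):
        z = 1.0
        for _ in range(n_steps):
            z *= 1.5 if rng.random() < 0.5 else 0.5
            if z >= lam:
                hits += 1
                break
    return hits / n_paths


lam = 4.0
freq = crossing_frequency(lam)   # maximal inequality: P[sup_n Z_n >= lam Z_0] <= 1 / lam
```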

A dynamic is said to be perturbed if for all $Q \in K$ and any neighborhood V with Q in its closure, the probability that $Q(t+1) \in V$ when $Q(t) = Q$ is positive. If the dynamic is perturbed, then the underlying Markov chain is ergodic, and it follows that any neighborhood is visited with positive probability.
Then, if the previous proposition can be applied in some neighborhood of a Nash equilibrium, one gets that almost surely, after some time, Q(t) stays close to some Nash equilibrium forever with high probability. The drawback of such an approach is that it does not provide bounds on the time required to reach such a neighborhood.

Notice that for Lyapunov games with a multiaffine Lyapunov function F with respect to dynamics (5) (this includes ordinal, and hence potential games, by the above discussion), the points $Q^*$ realizing the minimum value $F^*$ of F over the compact set K must correspond to Nash equilibria.

Fortunately, it is possible to get bounds on the expected time of convergence. As before, we write $L(\mu)$ for the subset of states Q on which $F(Q) \le \mu$.

Definition 3 (ε-Nash equilibrium) Let $\varepsilon > 0$. A state Q is an ε-Nash equilibrium iff for all $1 \le i \le n$ and $1 \le \ell \le m_i$, we have $u_i(e_\ell, Q_{-i}) \ge (1-\varepsilon)\,u_i(q_i, Q_{-i})$.

In other words, in an ε-Nash equilibrium, no player can decrease its cost by more than ε times its current cost by unilaterally changing its strategy.

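In a state that is not an ε-Nash equilibrium, there exist some i and ℓ with $u_i(e_\ell, Q_{-i}) < (1-\varepsilon)\,u_i(q_i, Q_{-i})$. Since $u_i$ is linear in $q_i$, this means $u_i(q_i - e_\ell, Q_{-i}) > \varepsilon\,u_i(q_i, Q_{-i})$.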
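Definition 3 can be turned into a direct check on a mixed profile. The sketch below assumes a hypothetical two-player game given by cost matrices (`cost1`, `cost2` and the example matrix `coordination` are illustrative names and values). It returns a violating pair $(i, \ell)$ with 0-based indices, or `None` for an ε-Nash equilibrium, and uses the linearity of $u_i$ in $q_i$ noted above.

```python
def strategy_costs(cost, q_other):
    """u_i(e_l, Q_{-i}) for l = 0, 1, ..., where cost[l][k] is the cost of own pure
    strategy l against the opponent's pure strategy k."""
    return [sum(row[k] * q_other[k] for k in range(len(q_other))) for row in cost]


def eps_nash_violation(cost1, cost2, q1, q2, eps):
    """Return a pair (i, l), 0-based, with u_i(e_l, Q_{-i}) < (1 - eps) u_i(q_i, Q_{-i}),
    or None if (q1, q2) is an eps-Nash equilibrium. cost1[a][b] and cost2[a][b] are the
    costs of players 1 and 2 when they play the pure strategies a and b."""
    cost2_own_first = [list(col) for col in zip(*cost2)]   # index player 2's costs by b first
    players = ((cost1, q1, q2), (cost2_own_first, q2, q1))
    for i, (cost, q, q_other) in enumerate(players):
        u = strategy_costs(cost, q_other)
        current = sum(x * y for x, y in zip(q, u))   # u_i(q_i, Q_{-i}), linear in q_i
        for l, ul in enumerate(u):
            # equivalent to u_i(q_i - e_l, Q_{-i}) = current - ul > eps * current
            if ul < (1 - eps) * current:
                return i, l
    return None


coordination = [[1.0, 4.0], [4.0, 2.0]]   # hypothetical: same costs for both players
```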

For the perturbed replicator-like dynamics, taking the perturbation O(b) in the definition of this dynamics to be 0, we have

$$E[\Delta q_{i,\ell} \mid Q(t)] = -\alpha b\,p_i\,q_{i,\ell}\bigl(u_i(e_\ell, Q_{-i}) - u_i(q_i, Q_{-i})\bigr) + b^2(1-\alpha)\Bigl(\frac{1}{m_i} - q_{i,\ell}\Bigr).$$

Assume without loss of generality that all costs are greater than 1. Let $C = p_i u_i(q_i - e_\ell, Q_{-i})$ and $\beta = 1 - \alpha$. The previous equation is of the form $b\bigl(\alpha q_{i,\ell}C + b\beta(\frac{1}{m_i} - q_{i,\ell})\bigr)$, hence a strictly increasing function of $q_{i,\ell}$ as soon as $b < \frac{\alpha}{\beta}p_i\varepsilon u_i(q_i, Q_{-i})$ and $C > p_i\varepsilon u_i(q_i, Q_{-i})$. In that case, its minimal value, obtained for $q_{i,\ell} = 0$, is $\delta = \frac{b^2\beta}{m_i}$.

So, as soon as $C > p_i\varepsilon u_i(q_i, Q_{-i})$, that is to say $u_i(q_i - e_\ell, Q_{-i}) > \varepsilon u_i(q_i, Q_{-i})$, we have $E[\Delta q_{i,\ell} \mid Q(t)] \ge \delta$, which implies $E[q_{i,\ell}(t+1) \mid Q(t)] \ge \delta$.

This implies that the opposite of $E[\Delta F(Q(t+1)) \mid Q(t)]$ will be greater than

$$V = b\alpha\,p_i\,\delta\sum_{\ell'} q_{i,\ell'}\bigl(u_i(e_\ell, Q_{-i}) - u_i(e_{\ell'}, Q_{-i})\bigr)^2 + O(b^2).$$

Taking $b < (1-\mu)\frac{\alpha}{\beta}p_i\varepsilon u_i(q_i, Q_{-i})$ for some fixed $\mu \in (0,1)$ guarantees that the coefficient of $q_{i,\ell}$ in the previously discussed expression is greater than $b\mu\alpha C$, and hence that its iterates grow exponentially fast near 0. Reasoning over sequences of k steps, i.e. about the opposite of $E[\Delta F(Q(t+k)) \mid Q(t)]$, this quantity will be greater than a term of order

$$V = b\alpha\,p_i\sum_{\ell'} q_{i,\ell'}\bigl(u_i(e_\ell, Q_{-i}) - u_i(e_{\ell'}, Q_{-i})\bigr)^2$$

in a non-ε-Nash equilibrium.
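The exponential growth argument can be illustrated by iterating the expected update of $q_{i,\ell}$ from 0. This is a sketch under a strong simplifying assumption that is not part of the paper: $C = p_i u_i(q_i - e_\ell, Q_{-i})$ is held constant along the iterations, so that $m_{t+1} = m_t + b(\alpha C m_t + b\beta(\frac{1}{m_i} - m_t))$; the numerical values are hypothetical. The first iterate equals $\delta = b^2\beta/m_i$, and each further step multiplies the value by at least $1 + b\mu\alpha C$ when $b < (1-\mu)\alpha C/\beta$, a condition implied by the one above since $C > p_i\varepsilon u_i(q_i, Q_{-i})$.

```python
def expected_mass(b, alpha, C, m_i, steps):
    """Iterate m_{t+1} = m_t + b (alpha C m_t + b beta (1/m_i - m_t)), beta = 1 - alpha,
    from m_0 = 0. Simplifying assumption: C = p_i u_i(q_i - e_l, Q_{-i}) is held
    constant, which the real dynamics does not guarantee."""
    beta = 1.0 - alpha
    seq = [0.0]
    for _ in range(steps):
        m = seq[-1]
        seq.append(m + b * (alpha * C * m + b * beta * (1.0 / m_i - m)))
    return seq


b, alpha, C, m_i, mu = 0.01, 0.9, 2.0, 3, 0.5            # hypothetical values
step_size_ok = b < (1 - mu) * alpha * C / (1 - alpha)   # b < (1 - mu) alpha C / beta
seq = expected_mass(b, alpha, C, m_i, steps=100)
```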

Theorem 5 Consider a Lyapunov game with a multiaffine Lyapunov function F, with respect to (5). This includes ordinal, and hence potential games, by the above discussion. Taking $b = O(\varepsilon)$, whatever the initial state of the stochastic algorithm, it will almost surely reach some ε-Nash equilibrium. Furthermore, it will do so in a random time whose expectation $T(\varepsilon)$ satisfies
T(e)<0(Ml). 

e 

Proof: 

Consider $V^* = \min q_{i,\ell'}\bigl(u_i(e_\ell, Q_{-i}) - u_i(e_{\ell'}, Q_{-i})\bigr)^2$. Let $I(\varepsilon)$ denote the set of states where the right-hand side of Equation (10) is greater than $-b\alpha\,\min_i p_i\,V^*\varepsilon$.
If the initial state is already an ε-Nash equilibrium, then there is nothing to prove.

Otherwise, this follows from the analysis before Theorem 5 and from Proposition 7 with $Z_i = F(Q(i))$, $\mathcal{T}_i$ the sigma-algebra generated by $(Q(j))_{j\le i}$, $C = \mu$ and $K = I(\varepsilon)$: indeed, whenever $Q(t) \notin I(\varepsilon) \cup L(\mu)$, this implies $\tau > t$, and we have $E[\Delta F(t) \mid Q(t)] = E[Z_{t+1} - Z_t \mid \mathcal{T}_t] \le -\varepsilon O(b)$. In all other cases, $E[Z_{n+1} \mid \mathcal{T}_n] = Z_n$, and hence all the hypotheses of Proposition 7 are satisfied. □

We believe these bounds are tight for generic ordinal games. The point is that in arbitrary ordinal games, there is not necessarily any relation between the gain in utility and the gain in potential: only the sign of the variation must be preserved.

Of course, better bounds can be hoped for in particular games, and especially in congestion games. For generic congestion games, there is a strong relation between the potential and the utilities of players. In congestion games, using the notation introduced earlier for congestion games, and writing $n_r(Q)$ for the number of players using resource r, the potential is given by $F(Q) = E\bigl[\sum_r\sum_{t=1}^{n_r(Q)} C_r(t)\bigr]$. In particular, $F(Q) \le E\bigl[\sum_{i=1}^{n} u_i(c_i, Q)\bigr]$, since $c_i(Q) = \sum_{r \in c_i} C_r(n_r(Q))$ and the $C_r$ are nondecreasing.
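For pure profiles, this potential is Rosenthal's potential (denoted Φ here), and the inequality holds because, for nondecreasing $C_r$, each resource contributes $n_r C_r(n_r) \ge \sum_{t=1}^{n_r} C_r(t)$ to the players' total cost. The sketch below enumerates all pure profiles of a small hypothetical congestion game (resources, cost functions and strategy sets are made up for illustration) and checks both the exact-potential property under unilateral deviations and the bound $\Phi \le \sum_i c_i$; taking expectations over mixed strategies gives the stated inequality for F.

```python
from itertools import product

# Hypothetical congestion game: nondecreasing costs C_r(t) for 3 resources,
# and for each of the 3 players the allowed subsets c_i of resources.
RESOURCE_COSTS = [lambda t: t, lambda t: 2 * t, lambda t: t * t]
STRATEGIES = [({0}, {1}), ({0, 1}, {2}), ({1}, {2})]


def loads(profile):
    """n_r: number of players whose chosen subset contains resource r."""
    n = [0] * len(RESOURCE_COSTS)
    for choice in profile:
        for r in choice:
            n[r] += 1
    return n


def player_cost(profile, i):
    """c_i = sum over r in c_i of C_r(n_r)."""
    n = loads(profile)
    return sum(RESOURCE_COSTS[r](n[r]) for r in profile[i])


def rosenthal(profile):
    """Pure-strategy potential: sum over resources r of sum_{t=1}^{n_r} C_r(t)."""
    n = loads(profile)
    return sum(RESOURCE_COSTS[r](t) for r in range(len(RESOURCE_COSTS)) for t in range(1, n[r] + 1))


profiles = list(product(*STRATEGIES))   # all pure profiles
```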

In particular, following [?], a congestion game is said to satisfy the α-bounded jump condition if its cost functions satisfy $C_r(t+1) \le \alpha C_r(t)$ for all $t \ge 1$. This ensures the following property for a suitable δ, whose value is given in [?]: whenever Q is not an ε-Nash equilibrium, then for at least one player i, the relative cost of adopting some pure strategy ℓ would induce a gain of at least δ times the resulting gain in potential.
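A hypothetical helper for checking the α-bounded jump condition on given cost functions is sketched below. It only tests loads $1 \le t <$ `max_load`, which is an assumption of the sketch: in an n-player game only loads up to n occur, whereas the definition quantifies over all $t \ge 1$.

```python
def satisfies_bounded_jump(cost_fns, alpha, max_load):
    """True if C_r(t + 1) <= alpha * C_r(t) for every cost function C_r and every
    load 1 <= t < max_load (only a finite range of loads is checked)."""
    return all(c(t + 1) <= alpha * c(t) for c in cost_fns for t in range(1, max_load))


linear = [lambda t: t, lambda t: 3 * t]   # hypothetical affine costs
quadratic = [lambda t: t * t]             # hypothetical quadratic cost
```

Linear costs satisfy the condition with α = 2 because $t + 1 \le 2t$ for $t \ge 1$, while $t^2$ needs α = 4.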

We believe that the perturbed replicator-like dynamics converge very fast (i.e., in polynomially many steps) on such games.
A Results About Semi-Martingales

Let $(Z_i)_{i\ge0}$ be a sequence of non-negative real random variables such that each $Z_i$ is measurable with respect to $\mathcal{T}_i$, for an increasing family of sigma-algebras $(\mathcal{T}_i)_{i\ge0}$.

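Proposition 7 (proof similar to [?, Theorem 2.1.1, page 17]) Assume that $Z_0$ is constant. Denote by τ the $\mathcal{T}_n$-stopping time representing the epoch of the first entry into $[0, C]$ or into some measurable subset K, for $C > 0$, i.e.

$$\tau(\omega) = \inf\{n \ge 1 \mid Z_n(\omega) \le C \ \vee\ Z_n(\omega) \in K\}.$$

Introduce the stopped sequence $Z'_n = Z_{n\wedge\tau}$, where

$$n \wedge \tau = \begin{cases} n, & \text{if } n \le \tau,\\ \tau, & \text{if } n > \tau.\end{cases}$$

We use the classical notation for the indicator function $1_A$:

$$1_A = \begin{cases} 1, & \text{if } A \text{ is true},\\ 0, & \text{otherwise.}\end{cases}$$

Assume $Z_0 > C$, and that for some $\varepsilon > 0$ and all $n \ge 0$,

$$E[Z'_{n+1} \mid \mathcal{T}_n] \le Z'_n - \varepsilon\,1_{\tau > n} \quad \text{almost surely.}$$

Then τ is almost surely finite and

$$E[\tau] \le \frac{Z_0}{\varepsilon} < \infty.$$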
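Proposition 7 is a Foster-type drift criterion. The following Monte Carlo sketch illustrates the bound on a hypothetical random walk, unrelated to the games above: Z decreases by 1 with probability 3/4 and increases by 1 otherwise, so its drift is $-\varepsilon$ with $\varepsilon = 1/2$ before τ. Starting from $Z_0 = 20$ with $C = 1$, the proposition gives $E[\tau] \le 40$; for this skip-free walk the exact value is $(20 - 1)/(1/2) = 38$, so the estimate should fall just below the bound.

```python
import random


def mean_entry_time(z0, c, p_down=0.75, n_paths=4000, seed=2):
    """Monte Carlo estimate of E[tau], tau = first n >= 1 with Z_n <= c, for the walk
    Z_{n+1} = Z_n - 1 with probability p_down and Z_n + 1 otherwise."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_paths):
        z, n = z0, 0
        while True:
            z += -1 if rng.random() < p_down else 1
            n += 1
            if z <= c:
                break
        total += n
    return total / n_paths


z0, c = 20, 1
eps = 2 * 0.75 - 1                  # E[Z_{n+1} | T_n] = Z_n - eps as long as tau > n
estimate = mean_entry_time(z0, c)   # Proposition 7 bounds E[tau] by z0 / eps = 40
```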

Proposition 8 ([?, Theorem 3.2, Chapter 7]) Assume that for all n, $E[Z_{n+1} - Z_n \mid \mathcal{T}_n] \le 0$. Then for all $\lambda' > 0$,

$$P\bigl[\sup_n Z_n \ge \lambda'\bigr] \le \frac{E[Z_0]}{\lambda'}.$$